Rhea: Automatic Filtering for Unstructured Cloud Storage

نویسندگان

  • Christos Gkantsidis
  • Dimitrios Vytiniotis
  • Orion Hodson
  • Dushyanth Narayanan
  • Florin Dinu
  • Antony I. T. Rowstron
چکیده

Unstructured storage and data processing using platforms such as MapReduce are increasingly popular for their simplicity, scalability, and flexibility. Using elastic cloud storage and computation makes them even more attractive. However cloud providers such as Amazon and Windows Azure separate their storage and compute resources even within the same data center. Transferring data from storage to compute thus uses core data center network bandwidth, which is scarce and oversubscribed. As the data is unstructured, the infrastructure cannot automatically apply selection, projection, or other filtering predicates at the storage layer. The problem is even worse if customers want to use compute resources on one provider but use data stored with other provider(s). The bottleneck is now the WAN link which impacts performance but also incurs egress bandwidth charges. This paper presents Rhea, a system to automatically generate and run storage-side data filters for unstructured and semi-structured data. It uses static analysis of application code to generate filters that are safe, stateless, side effect free, best effort, and transparent to both storage and compute layers. Filters never remove data that is used by the computation. Our evaluation shows that Rhea filters achieve a reduction in data transfer of 2x– 20,000x, which reduces job run times by up to 5x and dollar costs for cross-cloud computations by up to 13x.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comprehensive Analysis of Dense Point Cloud Filtering Algorithm for Eliminating Non-Ground Features

Point cloud and LiDAR Filtering is removing non-ground features from digital surface model (DSM) and reaching the bare earth and DTM extraction. Various methods have been proposed by different researchers to distinguish between ground and non- ground in points cloud and LiDAR data. Most fully automated methods have a common disadvantage, and they are only effective for a particular type of surf...

متن کامل

A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images

The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...

متن کامل

An Efficient Design and Implementation of an MdbULPS in a Cloud-Computing Environment

Flexibly expanding the storage capacity required to process a large amount of rapidly increasing unstructured log data is difficult in a conventional computing environment. In addition, implementing a log processing system providing features that categorize and analyze unstructured log data is extremely difficult. To overcome such limitations, we propose and design a MongoDB-based unstructured ...

متن کامل

A Vulnerable Scoring through Code-based Cloud Storage System with Sheltered Data Forwarding

Cloud Storage System has a collection of storage servers provides long-standing storage services over the internet. Data privacy becomes a major concern in cloud storage system because user stores their data in third party cloud system. An encryption scheme available for data privacy but it limits the number of functions done in storage system. Building a secure storage system that supports mul...

متن کامل

Presenting a Morphological Based Approach for Filtering The Point Cloud to Extract the Digital Terrain Model

The Digital terrain model is an important geospatial product used as the basis of many practical projects related to geospatial information. Nowadays, a dense point cloud can be generated using the LiDAR data. Actually, the acquired point cloud of the LiDAR, presents a digital surface model that contains ground and non-ground objects. The purpose of this paper is to present a new approach of ex...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013